The Adverse Event Unit (AEU): A novel metric to measure the burden of treatment adverse events

Objective To design a physician and patient derived tool, the Adverse Event Unit (AEU), akin to currency (e.g. U.S. Dollar), to improve adverse event (AE) burden measurement independent of any particular disease or medication class. Patients/Methods A Research Electronic Data Capture (REDCap) online survey was administered to United States physicians with board certification or board eligibility in general neurology, subspecialty neurology, primary care internal medicine or family medicine, subspecialty internal medicine, general pediatrics, and subspecialty pediatrics. Physicians assigned value to 73 AE categories chosen from the Common Terminology Criteria for Adverse Events (CTCAE) relevant to neurologic disorder treatments. An online forced choice survey was administered to non-physician potential patients through Amazon Mechanical Turk (MTurk) to weight the severity of the same AE categories. Physician and non-physician data were combined to assign value to the AEU. Surveys were completed between 1/2017 and 3/2019. Results 363 physicians rated the 73 AE categories derived from CTCAE. 660 non-physicians completed forced choice experiments comparing AEs. The AEU provides 0–10, weighted values for the AE categories studied that differ from the ordinal 1–4 CTCAE scale. For example, CTCAE severe diabetes (category 4) is assigned an AEU score of 9. Although non-physician input changed physician assigned AEU values, there was general agreement among physicians and non-physicians about severity of AEs. Conclusion The AEU has promise to be a useful, practical tool to add precision to AE burden measurement in the clinic and in comparative efficacy research with neurology patients. AEU utility will be assessed in planned comparative efficacy clinical trials.


Introduction
There is increasing emphasis on adverse event (AE) burden in neurology as new treatments are approved [1][2][3]. AEs cost more than $136 billion per year and add an average of 5 days to neurological hospitalizations [4][5][6][7]. AEs are important to patients and represent a barrier to treatment adherence. When structuring neurological treatment paradigms, among medications with equal efficacy, treatment decisions will be dictated by differences in AE burden, treatment burden, and cost. We remain without a practical metric to measure AEs that facilitates comparison of medications within and across different classes based on AEs alone.
The Adverse Event Unit (AEU) is a physician and patient weighted consensus unit, akin to currency (e.g. US dollar), designed to quantify and compare AE burden over time. Unlike previous measures, the AEU facilitates AE measurement independent of any disease or medication class, in terms of a number of AEUs that can be compared over time [8][9][10][11][12][13]. AEU scores can be combined with other outcome metrics and quality of life scores to better define the differences among treatments in comparative efficacy trials and in the clinic. Understanding AE tolerance in different neurological conditions and AEU validation against other disease metrics is planned for future studies. This manuscript describes the derivation of the AEU and potential applications for this new tool.

Methods
Development of the AEU was designed as a two-phase protocol to obtain input from physician experts and potential patients. In the first phase, US physicians assigned weight to the severity of AE associated with treatments for neurological illnesses. In the second phase, non-physician potential patients recruited through the Amazon Mechanical Turk (MTurk) service (https://www.mturk.com) rated the severity of the same group of AE. Data obtained from both phases were combined to generate value for the AEU. Surveys were completed between 1/2017 and 3/2019.

Standard protocol approvals, subject consent
The institutional review board (IRB) at the University of Vermont approved this protocol with a waiver of consent as all subjects were recruited anonymously through on-line surveys. Survey completion implied consent.

Physician subjects
United States physicians completed an on-line survey utilizing the secure Research Electronic Data Capture tool (REDCap) [14] hosted at University of Vermont. The target population was physicians with board certification or board eligibility in general neurology, subspecialty neurology, primary care internal medicine or family medicine, subspecialty internal medicine, general pediatrics, and subspecialty pediatrics. These specialties were chosen to capture the broad range of physicians who provide medical care for neurological patients.
Champions (MKH, TMB, DBA, KR, NK, AK, and ED) identified at US centers recruited colleagues in their communities and at other centers through a combination of targeted emails and in person meetings with groups of physicians. The American Academy of Neurology facilitated recruitment of current and previous physician recipients of the development award that supported the current study. All respondents were encouraged to forward the survey to colleagues in the aforementioned medical specialties.

Potential patient subjects
The online survey tool, MTurk, was used to recruit potential patients to represent a sample of the general population in the United States. MTurk is a viable and validated method to collect data about clinical and social science populations [15,16]. MTurk participants produced similar results when compared to in-person, university-recruited populations in psychological surveys, behavioral tests, matched comparison groups, economic experiments, clinical studies, and social science studies [17][18][19][20]. In general, MTurk participants tend to be younger. To sample a broad age range reflective of a typical neurology patient population, we stratified the surveys into the following available age cohorts: 25-30 years, 30-35 years, 35-45 years, 45-55 years, and greater than 55 years. The MTurk tool did not permit additional age stratification in the greater than 55 years category. Subjects were paid $5 for survey completion (Fig 1).
Items for analysis. The investigators (TB and MH) chose 73 AE categories relevant to medications prescribed across the field of neurology from the Common Terminology Criteria for Adverse Events (CTCAE) version 4 for analysis (Appendix 1 in S1 File) [21]. The CTCAE is a physician expert derived, widely employed, ordinal (1-5), unweighted scale commissioned by the National Cancer Institute and used to measure AE in many clinical trials [21]. Although AE severity increases along the CTCAE scale, items given the same value may not be of equal burden. For example, the AE of moderate hypertension (level 3), which carries the long-term risk of cardiovascular complications, is given the same score as a high fever of <24 hours duration (level 3). The CTCAE category 5, corresponding to death, was not analyzed because we are interested in assigning value to AE that can be monitored over time while a patient undergoes treatment. The finite category of death can be measured independently without weighting because death from any cause is presumably of equal importance and consequence. Under the guidance of a board certified pediatrician (DA), items to measure congenital complications were adapted from the DSM-5 definitions of intellectual disability, an epilepsy research classification of congenital abnormalities, and neural tube defect classification systems [22][23][24][25].

Survey design and administration
Physician subjects. Each physician subject was asked to assign values (0 = no significance to 10 = most significant) to a random sample of 30 AEs within and across the chosen 73 CTCAE and congenital categories of varying severities. They were asked to consider each AE independent of any one disease or treatment. Subjects were also asked to factor in the scores they assigned both within and across AE categories as they rated AE in the survey (Appendix 2 in S1 File). A separate pediatrician survey included all the congenital malformation AE evaluated in addition to non-congenital AE. Adult physicians also rated congenital AE. Median scores with associated interquartile ranges were calculated to assign initial value to the AEU. This method of assigning weighted values to the AE categories identified from the CTCAE was adapted from the method used by members of our research team (TB and MC) in the construction of the MG-Composite, a weighted, consensus outcome measure validated for use in evaluating patients with myasthenia gravis [26].
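The summary step described above (a median with an interquartile range for each AE item) can be illustrated with a minimal sketch. The ratings below are hypothetical and are not study data, the function name `summarize_ratings` is ours, and the choice of the "inclusive" quartile method is an assumption, since the quartile convention used in the study is not stated.

```python
from statistics import median, quantiles


def summarize_ratings(ratings):
    """Return (median, Q1, Q3) for one AE item's 0-10 physician ratings.

    Assumption: quartiles use the 'inclusive' method; the study does not
    state which quartile convention was applied.
    """
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), q1, q3


# Hypothetical ratings for a single AE item (not study data).
example_ratings = [5, 6, 6, 7, 7, 7, 8, 9]
med, q1, q3 = summarize_ratings(example_ratings)
print(f"median={med}, IQR={q1} to {q3}")
```

The median is used here because it is the statistic the text names; it is less sensitive than the mean to a few extreme ratings.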
Potential patient subjects. A subset of AE derived from the CTCAE and weighted by the physicians in phase 1 were converted into lay descriptions informed by Mayo Clinic descriptions of symptoms and medical conditions (https://www.mayoclinic.org/symptoms; https://www.mayoclinic.org/diseases-conditions). Potential patient friendly versions of the CTCAE have been employed in other studies [27].
Potential patient subjects reviewed pairs of AE descriptions (Table 1 and Appendix 3 in S1 File) assigned different AEU values by the physician subjects. AE pairs to review were randomly computer generated so that the compared AEs were from different AE categories and had been assigned different scores by the physician subjects. In the style of a discrete choice experiment, subjects were asked "After reviewing each pair of AE, please choose which of the two AE would be least tolerable (i.e. the most severe of the pair)" [28,29]. They were also asked to consider impact on quality of life (QOL), impact on life expectancy, future medical complication risk, likelihood of AE resolution following therapy change, and other factors considered important to the subject. Potential patients were not told how the physicians weighted the AE being evaluated.

Table 1. Example of potential patient discrete choice.

Instructions:
In this survey, you will review information about potential medication associated adverse events. You will be provided with a description of pairs of potential medication associated adverse events. Consider the side effects alone without thinking of any particular medication or disease.
After reviewing each pair of adverse events, please choose which of the two adverse events would be least tolerable (i.e. the most severe of the pair).
Consider the following in making your decisions:

Deep vein thrombosis (DVT) occurs when a blood clot forms in one or more of the deep veins in your body, usually in your legs. Deep vein thrombosis can cause leg pain or swelling, but also can occur with no symptoms. Deep vein thrombosis can be very serious because blood clots in your veins can break loose and lodge in your lungs, blocking blood flow (pulmonary embolism). Treatment of DVT includes anticoagulant medications (blood thinners) and in severe circumstances, placement of a filter in your blood vessels or treatment with a clot busting medication. Blood thinner treatment increases the risk for bleeding.

Complication: (AEU 7)*
You have developed a DVT in your leg as a result of medical treatment. No complications have occurred with this DVT, such as pulmonary embolism. You require treatment with a blood thinner for at least a few months. You may have been admitted to the hospital for a short time due to this issue.

Headache
Headaches may include syndromes that cause discomfort on the head including throbbing pain, stabbing pain, and numbness. Severe headaches may result in impaired physical and cognitive function. Severe headaches can impair daily function and may require treatment with medications. Drug induced headaches are likely to improve with stopping an offending medication.

Complication: (AEU 5)*
You have developed a severe headache that limits routine daily activities and self-care as a result of medical therapy. This headache lasts less than 1 week, may require a short course of pain medication (such as ibuprofen), and improves with discontinuing the offending medication. No ongoing medical therapy is required.

* Potential patient subjects were not shown the AEU value assigned by the physicians during the experiment. The values are presented to illustrate that these AEs were given different weights by the physician subjects.
https://doi.org/10.1371/journal.pone.0262109.t001

PLOS ONE
Combining physician and potential patient data. Bradley-Terry models were fit to the choices made by the potential patients using Firth's method for penalized maximum likelihood logistic regression in SAS version 9.4 [30,31]. The Bradley-Terry model uses the paired comparisons obtained through MTurk to estimate a set of 'less preferred' parameters for the AE. These parameters have the property that, if an AE with parameter A is compared to a second AE with parameter B, we would estimate that a proportion A/(A+B) of potential patients would choose the first AE as less tolerable. Once 'less preferred' parameters from potential patient choices were estimated, we created integer scores by applying K-means clustering to the less preferred estimates, setting the number of clusters equal to 9 to match the range of integer scores provided by the physicians (2 to 10). Final adjustment to AEU scores was done by comparing the physician and the potential patient AEU scores. If the potential patient AEU score was greater than the physician assigned AEU, we increased the physician AEU rating across an entire AE category (e.g. hypertension) by 1. If the potential patient AEU score was less than the physician assigned AEU, we decreased the physician AEU rating across an entire AE category by 1. This method of combining physician and potential patient AEU scores was chosen to give weight to the expertise of physicians in understanding the overall short and long-term sequelae of the rated AEs. It is essential to arrive at a single AEU scale to achieve the ultimate goal of developing a combined, easy to administer, best fit, weighted, consensus unit that would be feasible to administer in a clinical practice setting or clinical trial.
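Two pieces of the procedure above are simple enough to sketch: the Bradley-Terry choice proportion A/(A+B) and the one-point adjustment of the physician score toward the potential patient score. The sketch is illustrative only and is not the SAS analysis used in the study. The function names and example numbers are hypothetical, the integer patient score is assumed to have already been produced by the clustering step, and the adjustment is shown for a single item although the text applies it across an entire AE category.

```python
def prob_less_tolerable(param_a, param_b):
    """Bradley-Terry estimate of the proportion of raters who would choose
    the AE with 'less preferred' parameter A as less tolerable than the AE
    with parameter B."""
    return param_a / (param_a + param_b)


def adjust_physician_score(physician_score, patient_score):
    """Move the physician AEU score by one point toward the patient score.

    Simplification: shown per item; the source applies the shift across an
    entire AE category. The source does not say how scores at the ends of
    the scale were handled, so no clamping is applied here.
    """
    if patient_score > physician_score:
        return physician_score + 1
    if patient_score < physician_score:
        return physician_score - 1
    return physician_score


# Hypothetical 'less preferred' parameters for two AEs (not study estimates).
print(prob_less_tolerable(3.0, 1.0))

# Hypothetical physician and patient integer scores for three items.
for physician, patient in [(7, 9), (5, 3), (6, 6)]:
    print(physician, patient, adjust_physician_score(physician, patient))
```

Limiting the shift to one point, whatever the size of the disagreement, is what keeps the physician ratings as the anchor of the final scale.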

Results

Physician subjects
The targeted medical specialties were well represented (Table 2). The group was experienced and covered a wide geographic region. Recruited physicians were predominantly male and in academic practice. Primary care physicians practicing through university associated medical centers largely self-identified as being in academic practice.

Potential patient subjects
The potential patient cohort represented the wide range of ages typical of neurology patients (Table 2). U.S. geographic regions and sexes were equally represented. Potential patients with a college level of education or above were overrepresented. In addition, African Americans, Hispanic Americans, and Asian Americans were slightly underrepresented when compared to the most recent U.S. Census data [32].

Phase 1: Physician weighting
Three hundred sixty-three physicians provided data from 397 surveys; 34 physicians completed two different surveys with different sets of AE. On a 0-10 scale (0 = no importance and 10 = maximal importance), physician responses ranged from 2-10 across the 73 AE categories evaluated. Median values with interquartile ranges are available in Appendix 1 in S1 File. In many circumstances, the weighted values provided by the physicians did not match the rigid 1-4 ordinal CTCAE scale. For example, the CTCAE category 1, corresponding to a mild AE for pulmonary fibrosis, received an AEU score of 6. The CTCAE category 4, corresponding to a severe AE for diabetes, received an AEU value of 9. In contrast, the severe CTCAE category 4 for headache received an AEU value of 6, similar to the AEU values assigned for CTCAE category 2 diabetes and CTCAE category 1 pulmonary fibrosis.

Phase 2: Potential patient forced choice
Each of 660 MTurk potential patient raters made 20 random paired AE discrete choice comparisons. Two sets of comparisons, presented to 20 participants each, could not be used because the computer randomly assigned items with the same initial physician derived AEU score. These two sets of comparisons did not allow the participants to distinguish the choices, leaving 11,463 comparisons for analysis. All 73 AE categories were used in at least one paired comparison. Appendix Table 4 in S1 File provides estimates and standard errors for the logistic regression parameters estimated using Firth's method as well as a calibration plot. The model has excellent discrimination (c-index = 0.866) and calibration. Appendix 5 in S1 File shows the results of the K-means clustering used to assign integer scores to the preference parameters. Subsequent analyses adjusted the preference parameters for demographic characteristics of the MTurk respondents (age, sex, race/ethnicity, education, and region of the country). None of the characteristics were statistically significant and, more importantly, adjustment did not materially change the final ratings: it altered the final rating in only 3 of the 73 items, and never by more than 1 point. Given the additional complexity in interpreting the results with additional covariates, we present the final ratings based on the model without adjusting for demographic characteristics. The AE evaluated by the potential patients ranged from 2-10 AEU points on the scale generated by the physicians and the Bradley-Terry method (Table 3). Severity choice values ranged from 0.33 for a mild degree of diarrhea (physician AEU 3) to 8.5 for treatment related malignancy (physician AEU 9).

Phase 3: Combining physician and potential patient values
Final physician and potential patient combined AEU scores are presented in Table 4. Fifty-five of the 73 items were adjusted from the originally assigned physician scores to reflect input from the potential patients (Table 3). In three categories (hallucinations, dyskinesia, and thrombosis), the physician assigned AEU value of a more severe adverse event in a category was lower than that of the immediately preceding AE. For example, moderate hallucinations had an AEU score of 6 and Severe Hallucinations/Medical Intervention Indicated (Hospitalization Not Indicated) had an AEU score of 5. This likely occurred because physicians were not shown the full range of side effects in each category when assigning scores. We did not show physicians all the AEs in a category to reduce survey burden and to prevent bias from being shown a group of side effects in a previously determined, ordinal fashion. In these three circumstances, after discussion among the investigators, the decision was made to rate both categories with the higher AEU score prior to obtaining potential patient input; e.g. both hallucination categories in question have a final AEU score of 6. This decision is also supported by overlap in the interquartile range of these categories in the original physician subject weighted values.

Encephalocele:
Protrusion of brain and/or meninges through skull. Covered by skin. AND May be compatible with survival but with likely disability.

Anencephaly:
Open deficit, cranial tube exposed. Not compatible with survival.
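The Phase 3 handling of out-of-order physician scores within a category (for example, moderate hallucinations scored 6 and severe hallucinations scored 5, both then set to 6) can be sketched as a simple pass over the grades of one category. This is our reading of the rule, not the investigators' procedure, which was decided by discussion. The function name is hypothetical, and the scores 4 and 8 in the second example are invented for illustration.

```python
def reconcile_adjacent(scores):
    """Given physician AEU scores ordered from least to most severe grade
    within one AE category, raise any score that is lower than the score of
    the immediately preceding grade so that both share the higher value.

    Assumption: applied left to right, so a raised score can carry forward
    to later grades; the source describes only adjacent pairs.
    """
    fixed = list(scores)
    for i in range(1, len(fixed)):
        if fixed[i] < fixed[i - 1]:
            fixed[i] = fixed[i - 1]
    return fixed


# Pair reported in the text: moderate = 6, severe = 5.
print(reconcile_adjacent([6, 5]))

# Same pair embedded in a hypothetical four-grade category.
print(reconcile_adjacent([4, 6, 5, 8]))
```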

Discussion
In the age of precision medicine, well-designed, practical outcome measures and decision support tools expand the data we track about patients to better inform medical decisions [33,34]. Unlike previous measures, the physician and potential patient derived AEU quantifies AE burden in a common currency, independent of any disease or medication class, that can be compared among different medications over time. The AEU may facilitate movement from more gestalt AE burden measurement to more precise AE burden measurement, enriching treatment discussions between patients and physicians. Individual patients and physicians may not value AE in the same way, e.g. patients with more severe conditions such as cancer may tolerate a higher burden of AE. As a consensus metric, the AEU is not designed to be an absolute measure of burden and distress for any particular patient but rather a way to keep score of AE burden. The AEU is designed to best estimate the market price of specific AEs, similar to how the price is set for a good or service, e.g. $10 for a basketball and $30,000 for a car. Consumers decide if they are willing to pay consensus prices for these goods. Similarly, patients can be given AEU scores corresponding to the number and type of AEs they develop on a given therapy. In combination with measures of disease improvement, financial burden, overall QOL, severity of a patient's medical condition, patient age, and other factors unique to a particular patient, patients can decide whether they are willing to tolerate a specific AEU burden when making treatment decisions with their physician. Future validation projects, like one underway in a population of patients with myasthenia gravis, will attempt to understand clinically meaningful differences in AEU score over time for different patient populations.
Attempts have been made to develop disease and medication specific measures of AE [8][9][10][11][12][13]. Disease and medication specificity limit broad applicability. Quality Adjusted Life Year (QALY) is a useful measure of population cost effectiveness of varied treatments [35,36]. Since the QALY encompasses all aspects of health, financial cost, and QOL, it cannot measure AE burden alone. As a population based tool, the QALY is a less practical way to measure treatment burden in a comparative efficacy trial or in the clinic.
The CTCAE is a medication independent, physician derived AE measurement tool [21]. Due to lack of weighting and patient input, it provides only coarse AE burden measurement. We built the AEU based on the strengths of the CTCAE. The diverse physician group incorporated a wide range of opinions about AE impact on overall health, accounting for both current effects (e.g. joint pain) and future secondary consequences (e.g. stroke due to new diabetes), to assign AEU values. Although all physicians surveyed could rate congenital complication AEs, all pediatricians surveyed weighted these items as they care for impacted children.
The AEU incorporates potential patient opinions in assigning AE burden values. The use of potential patients rather than patients with particular diagnoses allows AE burden to be scored independent of any particular disease or medication. While we were not able to stratify the sample by whether MTurk respondents were parents, many subjects who rated congenital AEs self-identified as parents in the comment section. Since MTurk does not permit stratification by ethnicity, some groups were slightly underrepresented in our sample. We observed even representation of U.S. geographic regions. Utilizing MTurk, we obtained hundreds of opinions within days of survey release. Although MTurk introduced bias due to the requirement of basic computing skills, it reduced other bias, including the selection bias of clinicians when choosing patients for participation in research. We found recruitment through this online tool to be a logistically simple and cost-effective strategy to obtain opinions from large samples. This approach has the potential to be powerful for studies like this one and for obtaining preliminary data for clinical study design while reducing the inherent bias of the small focus group method.
We believe the weighted consensus AEU values provide a more complete measurement of AE burden. A CTCAE category 1 is often classified as mild [37]. However, all CTCAE categories across different AEs are not of equal value and were not weighted the same among our cohort. A CTCAE grade 1 may not reflect a good outcome in all circumstances. For example, CTCAE Grade 1 pulmonary fibrosis received an AEU score of 7 and was rated the same as CTCAE Grade 4 osteoporosis (Table 4). We also believe the AEU's independence of any particular disease or medication class is essential to allow comparison of treatments across medication classes. For example, prednisone and IVIG, treatments with different AE profiles, could be compared by the AEU in patients with myasthenia gravis.
Although 75% of AE categories required final AEU value adjustment when physician and potential patient values were combined, only 12% of items had a rating difference of 3 or more points between physicians and potential patients (Table 4). This suggests that while there is difference in physician and potential patient opinions on AE severity, there appears to be general agreement among the groups. Use of the Bradley-Terry paired comparisons model was a useful way to put the physician ratings, collected as scores, and the MTurk ratings, collected as a sequence of paired comparisons, on the same scale. Although physician opinions anchored AEU values, potential patient opinions were incorporated via the discrete choice surveys. We believe adjusting the AEU score to incorporate opinions of both groups strengthens the future applicability of this tool. In practice, patients often rely on physician expert caregivers to guide medical decisions.
We believe the AEU has great promise to be a useful, practical tool to add precision to AE burden measurement in the clinic and in comparative efficacy research for neurology patients. Future studies may show the AEU to be useful in other medical specialties. In comparative efficacy research, we anticipate that AE burden of drugs from different classes can be compared by AEU burden. Assigning an AEU score over time will account for more transient AEs that drop out over time (e.g. single headache) and more persistent AEs (e.g. new hypertension). The AEU score can be combined with other disease specific outcome metrics and QOL metrics to measure differences among medications over time. Evaluation of the validity, utility, and value of the AEU in comparative efficacy trials in myasthenia gravis and other neurological disorders is under way. If the AEU is useful in these studies, translation of some or all of the other items in the CTCAE could be performed to generalize the AEU to other medical subspecialties.
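As a rough illustration of tracking AEU burden over time, the sketch below totals AEU values per visit so that a transient AE drops out of later totals while a persistent AE continues to count. Summing item values per visit is our assumption; the text does not specify an aggregation rule. The AEU values in the lookup table are placeholders rather than the final values from Table 4, and all names are hypothetical.

```python
from collections import defaultdict

# Placeholder AEU values for illustration only; the final consensus values
# are those reported in Table 4 of the paper.
AEU_VALUES = {
    "severe headache": 5,
    "new hypertension": 6,
}


def aeu_burden_by_visit(observations):
    """Total AEU burden per visit.

    observations: iterable of (visit_number, ae_name) pairs, one pair for
    each AE present at that visit.
    Assumption: burden at a visit is the sum of the AEU values of the AEs
    present; the source does not define the aggregation.
    """
    totals = defaultdict(int)
    for visit, ae_name in observations:
        totals[visit] += AEU_VALUES[ae_name]
    return dict(totals)


# A single headache at visit 1 and hypertension persisting across visits 1-3.
observed = [
    (1, "severe headache"),
    (1, "new hypertension"),
    (2, "new hypertension"),
    (3, "new hypertension"),
]
print(aeu_burden_by_visit(observed))
```

Visits at which no AE is recorded do not appear in the result; a caller comparing treatments over a fixed schedule would need to treat those visits as zero burden.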